This report is for ETC5521 Assignment 1 by Team bilby, comprising Yuheng Cui and Jimmy Effendy.

1 Introduction and Motivation

Families and friends often spend their holidays and weekends in amusement parks. The popularity of amusement parks has been growing in recent years: worldwide attendance at the top 10 amusement park groups reached the half-billion mark for the first time last year (Schneider 2019), with 4% year-on-year growth (Index and Index/AECOM 2019).

With this ever-increasing popularity, it is reasonable to consider the safety of amusement rides a matter of substantial public interest (Woodcock 2014). An estimated 1,289 ride-related injuries occurred in North America in 2018, 26% more than in 2017 (Amusement Parks and Attractions 2018). The International Association of Amusement Parks and Attractions further stated that approximately 11% of those injuries were “serious”, meaning that they resulted in urgent admission and hospitalization for more than 24 hours for reasons other than medical observation, or in a fatality.

While accidents in amusement parks are arguably not frequent, they have prominent effects when they happen. The International Association of Amusement Parks and Attractions (2017) reported that, following a ride malfunction in Australia that killed four people, the park and other venues in Australia suffered considerable declines in attendance. This shows that public confidence, safety, and commercial feasibility are strongly interconnected (Woodcock 2014).

Thorough evaluations of injury records related to amusement rides are aligned with the public interest and are integral to encouraging constant improvement in the industry. This paper aims to find factors that have a major influence on amusement ride accidents. Uncovering these insights may encourage amusement park owners to use their resources more wisely in favor of the groups or equipment most at risk. In addition, this report may assist regulatory bodies when safety standards and regulations are developed.

In addition, this study intends to find patterns within injured groups. These patterns refer to high-risk groups, high-risk equipment and the seasonal trend of injuries. We believe amusement parks must make an effort to reduce incidents by maintaining equipment, training staff and monitoring visitors. At the same time, visitors should follow the parks' regulations and equipment instructions; after all, visitors are responsible for their own safety in the first place. Together, these efforts can reduce the injuries that occur in amusement parks and improve the well-being of society.

First, the data used in the report and how it was prepared for analysis are described. Then, the analysis and the findings obtained from the data sets are presented and discussed.

2 Data Description

The datasets were downloaded from a GitHub repository. Two datasets are provided in the repository: one originated from data.world and the other from the Saferparks database. An additional data set from the Texas Department of Insurance (TDI) on currently available insurance policies is also used in this report (TDI 2020a).

The data.world data set originated from TDI (Millerbernd 2018). It is a record of injuries caused by amusement rides in the State of Texas from 1 February 2013 to 1 February 2017 (Millerbernd 2018). An amusement ride is “any mechanical, gravity, or water device or devices that carry or convey passengers along, around, or over a fixed or restricted route or course or within a defined area for the purpose of giving its passengers amusement, pleasure, or excitement” (Insurance 2019). According to TDI (2019), amusement ride owners and operators must submit a quarterly injury report. TDI further stated that this record covers any injury that requires medical treatment or results in death.

This data set has 542 observations and 13 variables. Figure 2.1 shows that the other variable has too many missing values to be useful and that the serial_no variable also has many missing values. Fortunately, these two variables have little effect on our analysis, so we remove them. The age variable should be converted to numeric and injury_date to a date.

Figure 2.1: Texas amusement parks dataset
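The screening step described above (measure how much of each variable is missing, then drop the sparse ones) can be sketched as follows. This is a Python illustration rather than the R code used for the report; the toy records, the placeholder tokens treated as missing, and the 0.5 threshold are all assumptions.

```python
# Hypothetical toy records; only the variable names serial_no, other, age
# and injury_date come from the report, the values are made up.
records = [
    {"injury_date": "2013-06-01", "age": "12", "serial_no": None, "other": None},
    {"injury_date": "2014-07-15", "age": "40", "serial_no": "A-17", "other": None},
    {"injury_date": "39448", "age": None, "serial_no": None, "other": None},
    {"injury_date": "2016-08-02", "age": "8", "serial_no": "B-02", "other": "n/a"},
]

MISSING_TOKENS = {"", "na", "n/a"}  # assumed spellings of "missing"


def is_missing(value):
    """Treat None and a few placeholder strings as missing."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def missing_share(rows):
    """Return the fraction of missing values for every column."""
    columns = rows[0].keys()
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }


def drop_sparse_columns(rows, threshold=0.5):
    """Drop columns whose missing share is at or above the threshold."""
    shares = missing_share(rows)
    keep = [col for col, share in shares.items() if share < threshold]
    return [{col: row[col] for col in keep} for row in rows]


shares = missing_share(records)
cleaned = drop_sparse_columns(records, threshold=0.5)
```

In practice the threshold is a judgment call: the report keeps a sparse variable elsewhere when it is expected to be insightful.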

The accident records in the Saferparks data set, on the other hand, originated from the U.S. state and federal safety agencies that regulate amusement rides (Saferparks 2020). Saferparks obtained them by submitting public records requests to those agencies. In some cases, additional requests were submitted to specific agencies to achieve particular goals (Saferparks 2020). As a result, Saferparks needs to harmonize these records into its database.

The Saferparks data set has 8351 observations and 23 variables. Figure 2.2 shows that 4 variables have too many missing values to be analysed; therefore, we ignore them. The acc_date variable should be converted to a date. Although manufacturer also has many missing values, we decided to keep it because it may be insightful.

Figure 2.2: Saferparks dataset

2.1 Data Limitation

As documentation related to the accident reports provided by TDI is limited, it is difficult to determine the limitations of the data set. In addition to a considerable number of missing values in some of the variables, TDI does not provide a data dictionary. As a result, a fair amount of guesswork was required for some of the variables. Lastly, the format of the injury_date variable is inconsistent.

According to Saferparks (n.d.), reporting criteria, level of detail, types of equipment included, and years covered vary widely across years, industry sectors, jurisdictions, and other factors. Saferparks further stated that states that are transparent, vigilantly monitor safety incidents, and implement efficient data management systems will log a higher number of accidents. In other words, a high number of injuries may be an indication of being more attentive to safety, not less (Saferparks, n.d.).

While the data set can be used to uncover insights into how patrons get hurt on amusement rides, Saferparks does not recommend using it for comparisons across states, parks, rides or years (Saferparks, n.d.). One of the reasons is that state laws on reporting amusement ride injuries vary widely. For instance, it is mandatory to report go-kart accidents in Florida but not in California.

Due to these limitations, this report will not use the Saferparks data set to analyze nationwide patterns. Instead, it will largely focus on amusement park accidents that occurred in Texas, United States of America.

2.2 Data Cleaning and Transformation

A considerable amount of data wrangling was needed on the TDI injury data set prior to the analysis. Firstly, the format of the date variable was inconsistent: some observations were stored as Excel serial numbers (e.g. 39448). Dates in this format needed to be converted to “YYYY-MM-DD” format.
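The date repair can be sketched in Python as below (the report itself used R). The sketch assumes that a bare integer is an Excel serial number in the 1900 date system and that the remaining entries use one of two text formats; the function name normalize_injury_date and the list of text formats are hypothetical.

```python
from datetime import date, datetime, timedelta

# Excel's 1900 date system counts days from 1899-12-30 for any date after
# February 1900 (the offset absorbs Excel's phantom 29 February 1900).
EXCEL_EPOCH = date(1899, 12, 30)

# Assumed text formats for the non-serial entries; adjust to the real data.
TEXT_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_injury_date(raw):
    """Return the date as a "YYYY-MM-DD" string, or None if unparseable."""
    text = str(raw).strip()
    if text.isdigit():
        # A bare integer such as "39448" is read as an Excel serial number.
        return (EXCEL_EPOCH + timedelta(days=int(text))).isoformat()
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Under these assumptions, the serial number 39448 maps to 2008-01-01, and entries that match neither a serial number nor a known text format come back as None so they can be inspected by hand.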

Secondly, the following variables were added to the TDI injury data set:

  • injury_year: the year when the observed injury occurred
  • injury_month: the month when the observed injury occurred
  • injury_day: the day when the observed injury occurred
  • season: the season (U.S.) when the observed injury occurred
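A minimal Python sketch of how the four derived variables listed above could be computed from a cleaned “YYYY-MM-DD” date. The report does not say which season definition was used, so the month-to-season mapping below (meteorological seasons) is an assumption, as is the helper name add_date_parts.

```python
from datetime import date

# Assumed meteorological seasons for the U.S. (Northern Hemisphere);
# the report does not state which season definition was used.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def add_date_parts(record):
    """Return a copy of the record with year, month, day and season added."""
    injury_date = date.fromisoformat(record["injury_date"])
    return {
        **record,
        "injury_year": injury_date.year,
        "injury_month": injury_date.month,
        "injury_day": injury_date.day,
        "season": SEASON_BY_MONTH[injury_date.month],
    }


example = add_date_parts({"injury_date": "2015-06-20"})
```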

The TDI data set on current insurance policies also needed to be cleaned. Firstly, the janitor package was used to tidy the column names. Next, the agent variable needed to be wrangled because many of its values were misspelled.
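The two cleaning steps can be illustrated with a short Python sketch. The first helper is only loosely analogous to the snake_case column names that janitor produces and is not a port of it; the second applies a manual correction map after collapsing whitespace and case. The column names and misspellings shown are hypothetical, since the report does not list the actual ones.

```python
import re


def clean_name(name):
    """Convert a column name to lower snake_case."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return name.strip("_").lower()


def normalize_agent(raw, corrections):
    """Collapse whitespace and case, then apply a manual correction map."""
    key = " ".join(raw.split()).lower()
    return corrections.get(key, " ".join(raw.split()))


# Hypothetical misspellings; the real correction map has to be built by
# inspecting the distinct values of the agent variable.
AGENT_CORRECTIONS = {
    "hartford fire insurance company": "Hartford Fire Insurance Company",
    "hartford fire insurance compnay": "Hartford Fire Insurance Company",
    "hartford fire ins. co.": "Hartford Fire Insurance Company",
}

columns = [clean_name(c) for c in ["Policy Number", "Agent ", "Expiration-Date"]]
agent = normalize_agent("HARTFORD  FIRE INSURANCE COMPNAY", AGENT_CORRECTIONS)
```

Values that are not in the correction map are returned with only their whitespace tidied, which makes it easy to list the remaining distinct spellings and extend the map.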

Finally, the TDI injury data set was combined with the TDI insurance policies data set.

Much of the data wrangling and transformation was done using the dplyr and lubridate packages.

3 Analysis and Findings

3.1 Primary Question

What are the high-risk groups?

The primary question in this report is to identify the high-risk groups. The answer can help parks target these groups and remind people in these groups to take care of themselves. We mainly analyse age and gender for the primary question. The Saferparks dataset cannot be used here because it does not contain detailed individual records; therefore, we do not know the age of each injured person.

First, we need to clean the data. Observations with missing values in age or gender are removed, and age is converted from character to numeric. Second, we unify the gender records because some observations record male as ‘M’ while others record it as ‘m’. Third, we calculate the percentage of injuries at each age.
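The three steps can be sketched in Python as follows (the report's own processing was done in R). The raw records are hypothetical and only illustrate the mechanics; the helper names clean and percent_by are not from the report.

```python
from collections import Counter

# Hypothetical raw records; values are illustrative, not taken from the data.
raw = [
    {"age": "0", "gender": "F"},
    {"age": "0", "gender": "m"},
    {"age": "11", "gender": "f"},
    {"age": "", "gender": "M"},      # missing age: dropped
    {"age": "40", "gender": None},   # missing gender: dropped
    {"age": "11", "gender": "F"},
]


def clean(rows):
    """Drop rows with missing age or gender; cast age; upper-case gender."""
    cleaned = []
    for row in rows:
        age, gender = row.get("age"), row.get("gender")
        if not age or not gender:
            continue
        cleaned.append({"age": int(age), "gender": gender.strip().upper()})
    return cleaned


def percent_by(rows, *keys):
    """Percentage of rows per combination of the given keys, largest first."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    total = sum(counts.values())
    return [(combo, round(100 * n / total, 2)) for combo, n in counts.most_common()]


rows = clean(raw)
by_age = percent_by(rows, "age")
by_gender_age = percent_by(rows, "gender", "age")
```

Calling percent_by with one key corresponds to a ranking by age alone, and calling it with gender and age corresponds to a ranking by gender and age; in both cases the percentages are taken over all cleaned records.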

Figure 3.1: Bar plot for age

Table 3.1: Top 10 risky age group in amusement parks in Texas
Age  Percent
0    9.696970
11   3.838384
12   3.838384
16   3.838384
8    3.232323
13   3.030303
14   3.030303
40   3.030303
10   2.828283
7    2.626263

Table 3.2: Top 10 risky group by gender in amusement parks in Texas
Gender  Age  Percent
F       0    5.66
M       0    4.04
F       11   2.22
F       14   2.22
F       16   2.22
M       12   2.22
F       40   2.02
M       5    1.82
M       8    1.82
F       12   1.62

Table 3.1 shows the ranking of ages among the injured. Injured babies make up nearly 10% of the dataset. Excluding babies, 38.99 percent of the injured are under 18. We distinguish babies from other age groups because babies are carried by their parents and cannot move around on their own.

If we take gender into account, we can see the gender distribution by age. Table 3.2 shows the top-10 ranking by gender and age. In the top-10 list, it seems that more girls than boys get injured in amusement parks in Texas. However, across the whole dataset, among those under 18 (excluding babies), 18.97 percent of the injured are girls while 20.02 percent are boys.

In summary, the age range of the injured is broad (from 0 to 71), but the injured are concentrated among young people, especially children (under 18). The fact that almost 10 percent of the injured are babies indicates that parents must take care of their babies in amusement parks and make their babies' safety their first priority.

3.2 First Secondary Question

What is the most dangerous equipment in parks and which body parts are injured most often?

The first secondary question is “what is the most dangerous equipment in parks”. We count the total number of injuries by device type. Table 3.3 shows the top-10 ranking of high-risk equipment. It is not surprising that the roller coaster is the equipment that causes the most injuries. Gutierrez (2016) reported that seven of eight high-profile U.S. amusement park deaths before 2016 were related to roller coasters.

Table 3.3: Top 10 risky device type
Device Type              Total Number of Injuries
Coaster - steel          879
Trampoline court         678
Go-kart                  648
Tube slide               519
Aquatic play area        337
Coaster - wooden         227
Body slide               204
Flume ride               183
Water slide - undefined  176
Bowl slide               175

Figure 3.2: Injury description — word cloud

First, we create a list of common words, such as injury, injuries and pain. Second, we remove these unimportant words from the injury descriptions. Finally, the word cloud is created. In the word cloud (Figure 3.2), the bigger a word is, the more often it appears. Head appears to be the most frequent word, so we may assume that head injuries are the most common.
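The word counts behind such a word cloud can be sketched in Python as below; this is an illustration, not the tidytext workflow used for the report. The stop-word set contains the three words named above plus a few function words that are assumed, and the injury descriptions are hypothetical.

```python
import re
from collections import Counter

# Custom stop words named in the text, plus a few assumed function words.
STOP_WORDS = {
    "injury", "injuries", "pain",
    "the", "a", "and", "to", "of", "on", "in", "his", "her",
}


def word_frequencies(descriptions, stop_words=STOP_WORDS):
    """Count lower-cased words across descriptions, skipping stop words."""
    counts = Counter()
    for text in descriptions:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts


# Hypothetical injury descriptions, for illustration only.
descriptions = [
    "Hit head on the lap bar",
    "Neck pain and head injury",
    "Laceration to head",
]
freq = word_frequencies(descriptions)
```

A word cloud then scales each word by its count, so the word with the largest count in freq would be drawn largest.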

In summary, in a high proportion of injury cases, the injured are hurt in the upper half of the body. Amusement parks should pay extra attention to roller coasters because, unlike other equipment, a roller coaster failure can have severe consequences: no one can escape once a roller coaster is launched. First, parks should maintain roller coasters regularly in order to reduce mechanical failures. Second, they should train staff and ask them to go through a checklist every time they launch the equipment. Third, they should ask visitors to follow the safety guide. Furthermore, visitors should protect their head, neck and shoulders. Those body parts are vulnerable and important, and they are frequently injured in amusement park injury cases.

3.3 Second Secondary Question

Do amusement ride injuries have seasonal characteristics?

This section examines whether the number of amusement ride injuries follows a seasonal trend. Table 3.4 shows that ride-related injuries have a seasonal pattern that is consistent across the years. The number of injuries in autumn and winter was relatively low. The number started to rise in spring and reached its peak in summer.

Table 3.4: Amusement Rides Related Injuries across Years and Seasons
Season  2013  2014  2015  2016  2017
Autumn    11     6     9     7     6
Spring    30    36    20    23    17
Summer    81    55    89    64    45
Winter     2     5     4     1     2
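A season-by-year count of this kind can be sketched in Python as follows. The records are hypothetical and are assumed to already carry the derived season and injury_year variables; the helper names are illustrative and this is not the R code used for the report.

```python
from collections import Counter


def season_year_table(rows):
    """Count injuries for every (season, year) pair."""
    return Counter((row["season"], row["injury_year"]) for row in rows)


def format_table(counts):
    """Render the counts as plain-text rows, one season per line."""
    years = sorted({year for _, year in counts})
    seasons = sorted({season for season, _ in counts})
    lines = ["Season  " + "  ".join(str(y) for y in years)]
    for season in seasons:
        cells = "  ".join(f"{counts[(season, y)]:>4}" for y in years)
        lines.append(f"{season:<8}{cells}")
    return "\n".join(lines)


# Hypothetical records that already carry the derived season and year.
rows = [
    {"season": "Summer", "injury_year": 2015},
    {"season": "Summer", "injury_year": 2015},
    {"season": "Spring", "injury_year": 2015},
    {"season": "Summer", "injury_year": 2016},
]
counts = season_year_table(rows)
table_text = format_table(counts)
```

Season and year combinations with no records are reported as zero rather than omitted, which keeps every row of the table the same length.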

Figure 3.3 shows how the number of amusement ride injuries is distributed across months and years. The highest number of injuries in a single month occurred in June 2015, with 41 injuries. Another interesting feature of the graph is that the numbers of ride-related injuries in 2014 and 2017 are relatively low compared to other years.

It may be beneficial for ride owners to focus their resources on spring and summer, when the number of injuries is at its height. More regular and rigorous inspections of the ride equipment can be performed during these periods. Furthermore, ride owners may also provide staff with additional training in the period leading up to summer. Further study may also be worthwhile to answer the following questions (which unfortunately are out of the scope of this report):

  • Why did 2014 and 2017 have a comparatively low number of injuries?
  • Why did June 2015 have such a high number of injuries?

Figure 3.3: Ride Related Injuries Seasonal Trends

3.4 Third Secondary Question

What is the effect of the insurance company on the number of ride-related injuries?

According to TDI (2020b), every amusement ride in Texas is required to display a compliance sticker. To get these stickers, rides must be insured and inspected (TDI 2020b). TDI, however, is not involved in ride inspections. Instead, the ride owner's insurance company is responsible for performing these inspections. This section explores whether the number of injuries varies across the insurance agents responsible for inspecting the quality of rides. A summary table of accidents by insurance agent was constructed from the combined TDI injury and insurance policies data set. Table 3.5 shows that Hartford Fire Insurance Company oversaw the parks with by far the most ride-related injuries. Its total (212) is larger than that of the other four agents in the table combined (208).

Table 3.5: Top 5 Accidents by Insurance Agents
Insurance Agents                     Total Injuries
Hartford Fire Insurance Company      212
Everest National Insurance Company   96
Arch Insurance Company               45
ACE American Insurance Company       36
Everest Indemnity Insurance Company  31

It is also useful to see the trends of these injuries over the years. Figure 3.4 shows that while total injuries fluctuate across the years, they generally followed a downward trajectory. Hartford Fire Insurance Company, in particular, shows a steady and substantial reduction in total ride-related injuries across the years. In contrast, Everest Indemnity Insurance Company shows the reverse trend: its number of ride-related injuries rose every year.

It is important to note that a high number of injuries does not necessarily mean that an insurance company is negligent. It may be that the parks overseen by the insurance company attract many more guests than other parks. However, given that the government is not responsible for overseeing the condition of the rides, it is important to see how the number of injuries is distributed across insurance agents. Further analysis may be warranted to uncover the drivers behind the variability in the number of injuries across insurance agents, as well as its trends across the years. For instance, it may be beneficial to investigate why ride-related injuries decrease every year for Hartford Fire Insurance Company. Insights gained from such an investigation could help the industry as well as the general public by reducing ride-related accidents. Unfortunately, this is out of scope for this paper due to limited data availability.

Figure 3.4: Yearly Trend of Number of Rides Related Injuries in Texas

4 Acknowledgments

The following packages are used to produce this report: visdat (Tierney 2017), dplyr (Wickham et al. 2020), readr (Wickham, Hester, and Francois 2018), tidyverse (Wickham et al. 2019), lubridate (Grolemund and Wickham 2011), knitr (Xie 2014), kableExtra (Zhu 2019), tidytext (Silge and Robinson 2016), wordcloud (Fellows 2018), janitor (Firke 2020), here (Müller 2017), plotly (Sievert 2020).

References

Amusement Parks, International Association of, and Attractions. 2018. IAAPA RIDE SAFETY REPORT – NORTH AMERICA – 2018.

Fellows, Ian. 2018. Wordcloud: Word Clouds. https://CRAN.R-project.org/package=wordcloud.

Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Gutierrez, Lisa. 2016. “Eight High-Profile U.S. Amusement Park Deaths in Recent Years.” The Kansas City Star. https://www.kansascity.com/news/nation-world/national/article94407457.html.

IAAPA. 2017. Global Theme and Amusement Park Outlook 2017–2021.

Index, Theme, and Museum Index/AECOM. 2019. TEA/AECOM 2019 Theme Index and Museum Index: The Global Attractions Attendance Report.

Insurance, Texas Department of. 2019. “Amusement Ride FAQs.” https://www.tdi.texas.gov/commercial/lcamuseinfo.html#reports.

Millerbernd, Annie. 2018. “Texas Amusement Park Accidents,” March. https://data.world/amillerbernd/texas-amusement-park-accidents.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Saferparks. 2020. “Accident Reports from State/Federal Regulators.” https://ridesdatabase.org/saferparks/data/.

Schneider, Mike. 2019. “Theme Park Attendance Crosses Half-Billion Mark for 1st Time.” U.S. News & World Report. https://www.usnews.com/news/best-states/florida/articles/2019-05-23/theme-park-attendance-crosses-half-billion-mark-for-1st-time.

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.

Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.

TDI. 2020a. “Amusement Ride Current Insurance Policies.” https://www.tdi.texas.gov/commercial/lcamusepolicy.html.

———. 2020b. “Amusement Ride Requirements.” https://www.tdi.texas.gov/commercial/indexamusement.html.

Tierney, Nicholas. 2017. “Visdat: Visualising Whole Data Frames.” JOSS 2 (16): 355. https://doi.org/10.21105/joss.00355.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Woodcock, Kathryn. 2014. “Amusement Ride Injury Data in the United States.” Safety Science 62: 466–74.

Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.

Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.